{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "nbsphinx": "hidden"
   },
   "source": [
    "This notebook is part of the `kikuchipy` documentation https://kikuchipy.org.\n",
    "Links to the documentation won't work from the notebook."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Multivariate analysis\n",
    "\n",
    "See\n",
    "[HyperSpy's user guide](http://hyperspy.org/hyperspy-doc/current/user_guide/mva.html)\n",
    "for explanations on available multivariate statistical analysis (\"machine\n",
    "learning\") methods and more examples of their use."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Denoising EBSD patterns with dimensionality reduction\n",
    "\n",
    "Let's use principal component analysis (PCA) followed by dimensionality\n",
    "reduction to increase the signal-to-noise ratio $S/N$ in a small Nickel EBSD\n",
    "data set, here called denoising. This denoising is explained further in\n",
    "<cite data-cite=\"aanes2020processing\">Ånes et al. (2020)</cite>."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Exchange inline for notebook or qt5 (from pyqt) for interactive plotting\n",
    "%matplotlib inline\n",
    "\n",
    "import hyperspy.api as hs\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import kikuchipy as kp\n",
    "\n",
    "s = kp.data.nickel_ebsd_large(allow_download=True)  # External download\n",
    "s"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's first increase $S/N$ by removing the undesired static and dynamic\n",
    "backgrounds"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "s.remove_static_background()\n",
    "s.remove_dynamic_background()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Followed by averaging each pattern with the eight nearest patterns using a\n",
    "Gaussian kernel of $\\sigma = 2$ centered on the pattern being averaged"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "s.average_neighbour_patterns(window=\"gaussian\", window_shape=(3, 3), std=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We use the average image quality (IQ) and the IQ map to assess how successful\n",
    "our denoising was. Let's inspect these before denoising"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "iq1 = s.get_image_quality()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(iq1.mean())\n",
    "\n",
    "plt.imshow(iq1, cmap=\"gray\")\n",
    "plt.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The basic idea of PCA is to decompose the data to a set of values of linearly\n",
    "uncorrelated, orthogonal variables called principal components, or component\n",
    "factors in HyperSpy, while retaining as much as possible of the variation in the\n",
    "data. The factors are ordered by variance. For each component factor, we obtain\n",
    "a component loading, showing the variation of the factor's strength from one\n",
    "observation point to the next.\n",
    "\n",
    "Ideally, the first component corresponds to the crystallographic feature most\n",
    "prominent in the data, for example the largest grain, the next corresponds to\n",
    "the second largest feature, and so on, until the later components at some point\n",
    "contain only noise. If this is the case, we can increase $S/N$ by reconstructing\n",
    "our EBSD signal from the first $n$ components only, discarding the later\n",
    "components.\n",
    "\n",
    "PCA decomposition in HyperSpy is done via\n",
    "[singular value decomposition (SVD) as implemented in scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html#sklearn.decomposition.PCA).\n",
    "To prevent number overflow during the decomposition, our detector pixels data\n",
    "type must be of the float or complex type"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dtype_orig = s.data.dtype\n",
    "s.change_dtype(\"float32\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To reduce the effect of the mean intensity per pattern on the overall variance\n",
    "in the entire dataset, we center the patterns by subtracting their mean\n",
    "intensity before decomposing. This is done by passing `centre=\"signal\"`.\n",
    "Considering the expected number of components in our small Nickel data set,\n",
    "let's keep only 100 of the ranked components"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "n_components = 100\n",
    "s.decomposition(\n",
    "    algorithm=\"SVD\",\n",
    "    output_dimension=n_components,\n",
    "    centre=\"signal\",\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "s.change_dtype(dtype_orig)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can inspect our decomposition results by clicking through the ranked\n",
    "component factors and their corresponding loading"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "s.plot_decomposition_results()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "nbsphinx-thumbnail": {
     "tooltip": "Denoising of EBSD patterns by principal component analysis and dimensionality reduction"
    },
    "tags": [
     "nbsphinx-thumbnail"
    ]
   },
   "outputs": [],
   "source": [
    "factors = s.learning_results.factors  # (n detector pixels, m components)\n",
    "sig_shape = s.axes_manager.signal_shape[::-1]\n",
    "loadings = s.learning_results.loadings  # (n patterns, m components)\n",
    "nav_shape = s.axes_manager.navigation_shape[::-1]\n",
    "\n",
    "fig, ax = plt.subplots(ncols=2, figsize=(10, 5))\n",
    "ax[0].imshow(loadings[:, 0].reshape(nav_shape), cmap=\"gray\")\n",
    "ax[0].axis(\"off\")\n",
    "ax[1].imshow(factors[:, 0].reshape(sig_shape), cmap=\"gray\")\n",
    "ax[1].axis(\"off\")\n",
    "fig.tight_layout(w_pad=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also inspect the so-called scree plot of the proportion of variance as a\n",
    "function of the ranked components"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "_ = s.plot_explained_variance_ratio(n=n_components)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The slope of the proportion of variance seems to fall after about 50-60\n",
    "components. Let's inspect the components 60-64 for any useful signal"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots(ncols=4, figsize=(15, 5))\n",
    "for i in range(4):\n",
    "    factor_idx = i + 59\n",
    "    factor = factors[:, factor_idx].reshape(sig_shape)\n",
    "    factor_iq = kp.pattern.get_image_quality(factor)\n",
    "    ax[i].imshow(factor, cmap=\"gray\")\n",
    "    ax[i].set_title(f\"#{factor_idx}, IQ = {np.around(factor_iq, 2)}\")\n",
    "    ax[i].axis(\"off\")\n",
    "fig.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It seems reasonable to discard these components. Note, however, that the\n",
    "selection of a suitable number of components is in general difficult."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "s2 = s.get_decomposition_model(components=59)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "iq2 = s2.get_image_quality()\n",
    "iq2.mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots(ncols=2, figsize=(15, 4))\n",
    "im0 = ax[0].imshow(iq1, cmap=\"gray\")\n",
    "ax[0].axis(\"off\")\n",
    "fig.colorbar(im0, ax=ax[0], pad=0.01, label=\"IQ before denoising\")\n",
    "im1 = ax[1].imshow(iq2, cmap=\"gray\")\n",
    "ax[1].axis(\"off\")\n",
    "fig.colorbar(im1, ax=ax[1], pad=0.01, label=\"IQ after denoising\")\n",
    "fig.tight_layout(w_pad=-10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We see that the average IQ increased from 0.30 to 0.34. We can inspect the\n",
    "results per pattern by plotting the signal before and after denoising in the\n",
    "same navigator"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hs.plot.plot_signals([s, s2], navigator=hs.signals.Signal2D(iq2))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
